Predicting Pronoun Translation Using Syntactic, Morphological and Contextual Features from Parallel Data
Abstract
We describe the systems submitted to the shared task on pronoun prediction organized within the Second DiscoMT Workshop. The systems are trained on linguistically motivated features extracted from both sides of an English-French parallel corpus and their parses. We use a parser that integrates morphological disambiguation and handles the REPLACE_XX placeholders explicitly. In particular, we compare the relevance of three groups of features for French pronoun prediction: a) syntactic (from the English parse), b) morphological (from the French morphological analysis) and c) contextual (from the French sentence). We also discuss the role of each feature group for each pronoun class.
Similar resources
Improving Pronoun Translation for Statistical Machine Translation (SMT)
Machine Translation is a well established field, yet the majority of current systems perform the translation of sentences in complete isolation, losing valuable contextual information from previously translated sentences in the discourse. One such class of contextual information concerns who or what it is that a reduced referring expression such as a pronoun is meant to refer to. The use of ina...
Baseline Models for Pronoun Prediction and Pronoun-Aware Translation
This paper presents baseline models for the cross-lingual pronoun prediction task and the pronoun-focused translation task at DiscoMT 2015. We present simple yet effective classifiers for the former and discuss the impact of various contextual features on the prediction performance. In the translation task we rely on the document-level decoder Docent and a cross-sentence target language-model o...
A Feature-rich Supervised Word Alignment Model for Phrase-based Statistical Machine Translation
Word alignment plays an important role in statistical machine translation (SMT) systems. The output of word alignment can be used to build a phrase table, which is the core model in the decoding of new sentences. Most current SMT systems use GIZA++, a generative model, to automatically align words from sentence-aligned parallel corpora. GIZA++ works well when large sentence-aligned corpora are ...
Generating Complex Morphology for Machine Translation
We present a novel method for predicting inflected word forms for generating morphologically rich languages in machine translation. We utilize a rich set of syntactic and morphological knowledge sources from both source and target sentences in a probabilistic model, and evaluate their contribution in generating Russian and Arabic sentences. Our results show that the proposed model substantially...
Multilingual Aligned Parallel Treebank Corpus Reflecting Contextual Information And Its Applications
This paper describes Japanese-English-Chinese aligned parallel treebank corpora of newspaper articles. They have been constructed by translating each sentence in the Penn Treebank and the Kyoto University text corpus into a corresponding natural sentence in a target language. Each sentence is translated so as to reflect its contextual information and is annotated with morphological and syntacti...